3D cache-oblivious multi-scale traversals of meshes using 8-reptile polyhedra

نویسنده

  • H. Haverkort
چکیده

In the field of numerical simulation, a multi-scale grid and its traversal can have a huge impact on the cache-efficiency of the calculations performed. We present the Haverkort element traversal and refinement scheme which is based on the bisection of 8-reptile tetrahedra. We developed a stack-assignment scheme that allows the Haverkort traversal to use stacks for storing the input data, the output data, and the temporary data of vertices and elements. Our parallel plane approach to assign stacks to vertices gave solution that requires at most 8 stacks to store temporary vertex data for uniform and multi-scale refined grids independent of the refinement level. Furthermore 8 is also the lower bound for the number of stacks for our parallel plane approach. Additionally we show that a general lower bound exists of 5 temporary stacks for storing the vertex data during a traversal. The combination of the Haverkort traversal with our Constant Stack algorithm is cache-oblivious, suitable for multi-level cache hierarchies and maintains its performance for multi-level refined grids and allows for fast and efficient space-filling-curve-based (re)partitioning. The Constant Stack algorithm has a time complexity of O(1) for stackassignment and -access and can achieve a cache-miss ratio of less than 5.2% .For all of these reasons we expect that a constant-number-of-stacks solution can compete with and outperform numerical simulations using cache-optimization techniques based on loop blocking when running on CPUs or GPUs. We apply the approach found for obtaining a constant-number-of-stacks solution for the Haverkort element traversal and a refinement scheme was applied to two other suitable 8-reptile polyhedra. Traversal and refinement schemes are given for an 8-reptile bisected cube and an 8-reptile bisected triangular prism needing 9 and 7 stacks respectively. this page is intentionally left blank

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Scanning and Traversing: Maintaining Data for Traversals in a Memory Hierarchy

We study the problem of maintaining a dynamic ordered set subject to insertions, deletions, and traversals of k consecutive elements. This problem is trivially solved on a RAM and on a simple two-level memory hierarchy. We explore this traversal problem on more realistic memory models: the cache-oblivious model, which applies to unknown and multi-level memory hierarchies, and sequential-access ...

متن کامل

A Fast Cache Oblivious Mesh Layout with Theoretical Guarantees

One important bottleneck when visualizing large data sets is the data transfer between processor and memory. Cache-aware (CA) and cacheoblivious (CO) algorithms take into consideration the memory hierarchy to design cache efficient algorithms. CO approaches have the advantage to adapt to unknown and varying memory hierarchies. Recent CA and CO algorithms developed for 3D mesh layouts significan...

متن کامل

Cache-Oblivious Traversals of an Array’s Pairs

Cache-obliviousness is a concept first introduced by Frigo et al. in [1]. We follow their model and develop a cache-oblivious algorithm for traversing all pairs of elements in a one-dimensional array and prove that it is optimally cache-efficient. Though the the traversal is recursive in nature, we demonstrate how to implement the algorithm nonrecursively, and we give experimental results.

متن کامل

Optimal Oblivious Routing On D-Dimensional Meshes

In this work we consider deterministic oblivious k-k routing algorithms with buffer size O(k). Our main focus lie is the design of algorithms for dimensional n n meshes, d> 1. For these networks we present asymptotically optimal O(kpnd) step oblivious k-k routing algorithms for all k and d > 1.

متن کامل

Cache-Efficient Parallel Isosurface Extraction for Shared Cache Multicores

This paper proposes to revisit isosurface extraction algorithms taking into consideration two specific aspects of recent multicore architectures: their intrinsic parallelism associated with the presence of multiple computing cores and their cache hierarchy that often includes private caches as well as caches shared between all cores. Taking advantage of these shared caches require adapting the ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016